Predicting breast cancer survivability: a comparison of three data mining methods

Authors

  • Dursun Delen
  • Glenn Walker
  • Amit Kadam
Abstract

OBJECTIVE: The prediction of breast cancer survivability has been a challenging research problem for many researchers. Since the early days of this research, considerable advances have been made in several related fields. Innovative biomedical technologies allow better explanatory prognostic factors to be measured and recorded. Low-cost computer hardware and software allow high-volume, better-quality data to be collected and stored automatically. Better analytical methods allow these voluminous data to be processed effectively and efficiently. The main objective of this manuscript is therefore to report on a research project in which we used these technological advances to develop prediction models for breast cancer survivability.

METHODS AND MATERIAL: We used two popular data mining algorithms (artificial neural networks and decision trees) along with a commonly used statistical method (logistic regression) to develop the prediction models on a large dataset (more than 200,000 cases). We also used 10-fold cross-validation to obtain unbiased estimates of the three models' prediction performance for comparison purposes.

RESULTS: The decision tree (C5) was the best predictor, with 93.6% accuracy on the holdout sample (a prediction accuracy better than any reported in the literature). Artificial neural networks came second with 91.2% accuracy, and logistic regression came last of the three with 89.2% accuracy.

CONCLUSION: Comparing multiple prediction models for breast cancer survivability on a large dataset with 10-fold cross-validation gave us insight into the relative prediction ability of different data mining methods. Sensitivity analysis on the neural network models gave us the prioritized importance of the prognostic factors used in the study.
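The evaluation protocol described above (10-fold cross-validation to compare models, then a sensitivity analysis to rank input factors) can be illustrated with the minimal, standard-library-only Python sketch below. It is not the authors' implementation. The study compared C5 decision trees, neural networks and logistic regression on more than 200,000 cases. This sketch instead assumes a toy gradient-descent logistic regression and a small synthetic dataset. All function names (`make_folds`, `fit_logistic`, `cross_val_accuracy`, `permutation_sensitivity`, `synthetic_data`), the feature weights and the learning-rate settings are hypothetical. For the ranking step, the sketch swaps in permutation-based sensitivity: it shuffles one input and measures the resulting accuracy drop. This is a common model-agnostic variant and may differ from the sensitivity analysis the authors ran on their neural networks.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_folds(n, k=10, seed=0):
    """Shuffle indices 0..n-1 (seeded) and deal them into k disjoint folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Hypothetical stand-in model: logistic regression via full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X, y, fit, predict_fn, k=10, seed=0):
    """k-fold CV: each fold is held out once while the model trains on the other k-1."""
    scores = []
    for test_idx in make_folds(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict_fn(model, [X[i] for i in test_idx])
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores), scores


def permutation_sensitivity(model, X, y, predict_fn, seed=0):
    """Accuracy drop when one input column is shuffled; a larger drop means a more important input."""
    rng = random.Random(seed)
    base = accuracy(y, predict_fn(model, X))
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(y, predict_fn(model, X_perm)))
    return drops


def synthetic_data(n=500, seed=1):
    """Hypothetical data: feature 0 is strong, feature 1 weak, feature 2 pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        y.append(1 if rng.random() < sigmoid(2.5 * x[0] + 0.5 * x[1]) else 0)
        X.append(x)
    return X, y


if __name__ == "__main__":
    X, y = synthetic_data()
    mean_acc, _ = cross_val_accuracy(X, y, fit_logistic, predict, k=10)
    print(f"10-fold CV accuracy: {mean_acc:.3f}")
    model = fit_logistic(X, y)
    for j, drop in enumerate(permutation_sensitivity(model, X, y, predict)):
        print(f"feature {j}: accuracy drop {drop:+.3f}")
```

`cross_val_accuracy` only needs a `fit`/`predict` pair, so a decision-tree or neural-network learner could be passed in the same way. Every model is then scored on identical, seeded folds, which is what makes per-model accuracies comparable. For brevity, the sensitivity scores here are computed on the training data. Scoring on held-out folds would give a less optimistic ranking.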

Similar resources

Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Applying data mining methods as a decision support system is highly beneficial for predicting the survival of new patients. It also has great potential for health researchers investigating the relationship between risk factors and cancer survival. However, because datasets associated with breast cancer survival are imbalanced, the accuracy of survival prognosis models is a challenging issue...

Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries

The application of data mining and machine learning to uncovering possible hidden knowledge in clinical research is becoming highly influential in cancer research. This research presents a comparison of three data mining classification models: multi-layer perceptron neural networks, C4.5 decision trees and Naive Bayes. The classification models are built for breast cancer survivability predic...

Optimal Data Mining Method for Predicting Breast Cancer Survivability

Breast cancer is one of the leading causes of death. This study predicts the 5-year survivability of breast cancer patients using two data mining techniques. The data set consisted of information, collected by SEER, about patients with a cancer diagnosis. In this study, the data set is pre-classified into survival (90.66%) and non-survival (9.34%) classes. The selected variables used to predict 5-y...

Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability

Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. An important issue here is that preparing some features may require difficult and costly experiments, even though these features have little impact on the final decision and could be dropped from the feature set. Therefore, developing a machine for p...

Using data mining techniques for predicting the survival rate of breast cancer patients: a review article

This review was conducted between December 2018 and March 2019 at Isfahan University of Medical Sciences. A review of various studies revealed which data mining techniques have been used to predict the probability of survival, which risk factors these predictions relied on, which criteria were used to evaluate the techniques, and, finally, which data sources have been used to predict the surv...

Journal title:
  • Artificial Intelligence in Medicine

Volume 34, Issue 2

Publication date: 2005